
feat: UAT runner ergonomics + demote fastmcp tool-failure tracebacks #1051

Merged — sergeykad merged 5 commits into master from fix/uat-runner-ergonomics on Apr 24, 2026

Conversation

@sergeykad Collaborator

What does this PR do?

Cleans up the UAT runner and story harness along several axes surfaced while debugging a BAT session. No behavior changes to the production MCP server beyond one log-level downgrade.

Server-side

UAT runner ergonomics (tests/uat/run_uat.py, tests/uat/stories/run_story.py, tests/uat/README.md)

  • SuggestingArgumentParser: typo-tolerant argparse with difflib suggestions (e.g., --agants → did you mean --agents?); sketched together with the stdin guard after this list.
  • Stdin TTY guard so run_uat.py with no pipe/file fails fast with a helpful message instead of hanging.
  • Quick-start section in the README pointing to run_story.py --all.
  • Commands corrected to uv run python; /v1 suffix requirement called out for LM Studio/Ollama.
  • Default LOG_LEVEL=WARNING for the spawned MCP subprocess (override with --mcp-env LOG_LEVEL=INFO).
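A condensed sketch of the two guards (illustrative only — the exact wording and helper names in run_uat.py differ):

import argparse
import difflib
import sys


class SuggestingArgumentParser(argparse.ArgumentParser):
    """Argparse that suggests the closest known flag on a typo."""

    def error(self, message: str) -> None:
        if "unrecognized arguments:" in message:
            bad = message.split("unrecognized arguments:")[-1].split()
            known = [opt for action in self._actions for opt in action.option_strings]
            for arg in bad:
                match = difflib.get_close_matches(arg, known, n=1)
                if match:
                    message += f" (did you mean {match[0]}?)"
        super().error(message)  # prints usage + message, then exits with status 2


def read_stdin_or_fail() -> str:
    """Fail fast with a hint instead of hanging on an interactive terminal."""
    if sys.stdin.isatty():
        raise SystemExit("run_uat.py expects input on stdin; pipe or redirect a file into it")
    return sys.stdin.read()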

Shared in-process MCP client (tests/uat/_inprocess.py)

  • One HomeAssistantSmartMCPServer + FastMCP client per agent run, shared across all stories' setup/verify/teardown. Previously rebuilt per phase per story; construction takes ~1.5s, so sharing saves ~150s on a 50-story run (see the sketch after this list).
  • verify_ha_checks(mcp_client) now takes the client as a required argument. _mcp_context in verify_story.py was a duplicate and has been removed.
  • The pytest mcp_client fixture in tests/uat/stories/conftest.py also uses the shared helper (bonus: it now picks up the websocket_manager.disconnect() that the old fixture skipped).
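Roughly, the shared helper has this shape (a sketch: the import paths and the .mcp attribute below are placeholders, and the env-var swap / ha_mcp.config._settings reset are omitted):

from contextlib import asynccontextmanager

from fastmcp import Client


@asynccontextmanager
async def inprocess_mcp_client():
    """One server + one client per agent run, reused by setup/verify/teardown.

    Not safe for concurrent use: mutates process-global state.
    """
    from ha_mcp.server import HomeAssistantSmartMCPServer   # placeholder path
    from ha_mcp.websocket import websocket_manager          # placeholder path

    server = HomeAssistantSmartMCPServer()        # ~1.5s — pay it once, not per phase
    try:
        async with Client(server.mcp) as client:  # in-memory FastMCP transport; .mcp is a placeholder
            yield client
    finally:
        # Symmetric teardown: drop the cached websocket so the next run starts clean.
        await websocket_manager.disconnect()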

Logging migration (tests/uat/*)

  • Replaced four duplicated def log() helpers (thin print wrappers) with stdlib logging module usage throughout. Each UAT file uses an explicit logging.getLogger("uat.<module>") so the namespace level filter works regardless of script vs. module invocation.
  • New tests/uat/_logging.py::configure_cli_logging() — single place that sets root=WARNING / uat.*=INFO, silencing httpx/openai/mcp INFO chatter without suppressing our own trace (sketched after this list).
  • Level mapping: FATAL: ... → logger.critical, ERROR: ... → logger.error / logger.exception, check failures and summary-level story failures → logger.warning.
  • Stripped the Pydantic errors.pydantic.dev URL footer from client-side validation error echoes (tests/uat/openai_agent.py::_strip_pydantic_url).
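The shape of the setup, roughly (handler and format details simplified):

import logging


def configure_cli_logging() -> None:
    """Root at WARNING (mutes httpx/openai/mcp INFO), uat.* at INFO."""
    logging.basicConfig(level=logging.WARNING, format="%(levelname)s %(name)s: %(message)s")
    logging.getLogger("uat").setLevel(logging.INFO)


# Each UAT module then grabs an explicit namespaced logger, so the level
# applies whether the file runs as a script or is imported as a module:
logger = logging.getLogger("uat.run_story")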

Summary correctness

  • The Summary block in run_story.py now uses _compute_passed (the same logic as the JSONL records written via append_result). Previously it only considered the test-prompt exit code, so stories that failed ha_checks but exited 0 were marked PASS in the summary but FAIL in the JSONL. These now agree.
  • append_result signature changed: exit_code= → passed= (single source of truth, computed once per story); see the sketch after this list.
  • Summary now prints total wall time.
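In outline (the JSONL field names here are illustrative, not the exact record format):

import json
import time
from pathlib import Path


def _compute_passed(exit_code: int, ha_checks_ok: bool) -> bool:
    # A story passes only if the test prompt exited 0 AND its ha_checks held.
    return exit_code == 0 and ha_checks_ok


def append_result(results_path: Path, story: str, passed: bool = False) -> None:
    # Failure-closed default; the Summary block reuses the same value.
    record = {"story": story, "passed": passed, "ts": time.time()}
    with results_path.open("a") as fh:
        fh.write(json.dumps(record) + "\n")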

Type of change

  • 🐛 Bug fix (summary PASS/FAIL mismatch; fastmcp validation traceback noise)
  • ✨ New feature
  • 📚 Documentation
  • 🔧 Maintenance/refactor (logging migration, shared MCP client)
  • 🧪 Tests only
  • 💥 Breaking change

Testing

  • I have tested these changes with an LLM agent (one BAT run of s01_automation_sunset_lights against LM Studio verified: clean WARNING logs, no Pydantic URL footer, one server construction, correct FAIL summary from _compute_passed)
  • All automated tests pass (uv run pytest tests/uat/stories/test_verify_story.py — 12/12)
  • Code follows style guidelines (uv run ruff check — clean)

Checklist

  • I have updated documentation if needed (UAT README)

Sergey added 2 commits April 24, 2026 19:05
- Demote fastmcp tool-validation/FastMCPError tracebacks to one-line
  WARNING via ToolValidationLogFilter (pending upstream fastmcp PR #4029).
- Runner ergonomics: argparse typo suggestions, TTY stdin guard, clearer
  startup progress, UAT README quick-start, LOG_LEVEL=WARNING default for
  spawned MCP subprocess.
- Share one in-process FastMCP server per agent run (new
  tests/uat/_inprocess.py) instead of rebuilding per setup/verify phase.
  Saves ~1.5s per story-phase; ~150s on a 50-story run.
- Migrate all UAT scripts from print-based log() to stdlib logging; keep
  uat.* namespace at INFO while silencing httpx/openai/mcp INFO chatter.
- Fix Summary pass/fail: now uses _compute_passed (same logic as JSONL
  records), so stories that fail ha_checks but exit 0 are correctly
  marked FAIL.
- Drop duplicated client-side validation error echo (server log already
  shows the failure details).
@gemini-code-assist Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request improves the ergonomics, performance, and logging reliability of the UAT runner and story harness. By centralizing logging, optimizing MCP client lifecycle management, and refining CLI usability, these changes make the test suite faster and easier to debug without altering production behavior.

Highlights

  • Logging Improvements: Migrated UAT scripts to use the standard logging module instead of print wrappers, allowing for better control over log levels and namespace filtering. Added a ToolValidationLogFilter to demote noisy Pydantic and FastMCP validation tracebacks to single-line warnings.
  • UAT Runner Ergonomics: Introduced a typo-tolerant argument parser, added a TTY guard to prevent hanging, and updated the README with quick-start instructions and command corrections.
  • Performance Optimization: Implemented a shared in-process MCP client for UAT stories, significantly reducing overhead by reusing the server instance across setup, verification, and teardown phases.
  • Summary Correctness: Unified the PASS/FAIL logic in the story runner to ensure the summary output matches the generated JSONL records, and added total wall-time reporting.

@gemini-code-assist Bot left a comment

Code Review

This pull request refactors the User Acceptance Testing (UAT) suite to improve logging, error reporting, and resource efficiency. It introduces a ToolValidationLogFilter to demote noisy tracebacks, standardizes on the Python logging module across CLI tools, and implements a shared in-process FastMCP client context to reduce server startup overhead. Feedback focuses on adhering to the project's requirement for type hints in all function signatures and improving the robustness of the story runner by handling setup failures more gracefully to avoid terminating the entire test suite.

Comment thread tests/uat/_inprocess.py Outdated
Comment thread tests/uat/stories/run_story.py Outdated
Comment thread tests/uat/stories/run_story.py
Comment thread tests/uat/stories/scripts/verify_story.py Outdated
Sergey added 2 commits April 24, 2026 21:37
run_start was shadowed inside the per-story loop by a variable tracking
the test prompt start (used for session file discovery). Rename the inner
variable to prompt_start so the Summary's elapsed calculation reflects
the whole run.
@sergeykad sergeykad marked this pull request as ready for review April 24, 2026 18:43
@sergeykad sergeykad requested review from a team and julienld April 24, 2026 18:43
@kingpanther13 (Member) left a comment

Thanks for the UAT cleanup — the shared MCP client savings, the _compute_passed consolidation, and the unified uat.* logger namespace are all solid wins. A few items worth addressing before merge:

Important

  1. ToolValidationLogFilter is wider than the stated intent (src/ha_mcp/__main__.py:387) — isinstance(err, FastMCPError) also matches AuthorizationError, ResourceError, PromptError, etc. If the goal is "tool-raised, user-visible error", tightening to isinstance(err, ToolError) would match the docstring exactly and avoid silencing a future auth failure (a sketch follows this list).

  2. Filter discards exc_info / exc_text permanently (__main__.py:394-395) — any downstream handler (Sentry, structured logging, file tail) loses the stack after this filter runs. Consider stashing a one-frame summary into record.msg instead of nulling entirely, so minimal debug context survives.

  3. Commit / PR scope is misleading — feat(uat): covers a production logging change that affects the live server, not just UAT. Release-note automation keys off these prefixes. Splitting the log filter into its own feat(internal): demote fastmcp tool-failure tracebacks commit (or retitling the PR) would keep the changelog honest.

  4. Shared MCP client is a single point of failure across stories (tests/uat/stories/run_story.py around line 786) — the prior per-call _mcp_context gave implicit isolation; reusing one client means a story that corrupts the WebSocket can poison every subsequent story in the same agent run. A short try/reconnect guard, or at minimum a docstring note acknowledging the tradeoff, would help future maintainers diagnose intermittent failures.

  5. inprocess_mcp_client finally doesn't reset websocket_manager (tests/uat/_inprocess.py:49-58) — env vars and _settings are restored, but the module-level websocket stays connected to the (possibly-stopped) test container. Consider await websocket_manager.disconnect() in the finally too, symmetrical with the entry path.

  6. Env-var mutation is process-global (_inprocess.py:37-42) — safe under current sequential usage, but worth an explicit docstring line calling out "not safe for concurrent use" so nobody later adds pytest-xdist or parametrized fixtures and hits races.

  7. _run_mcp_steps teardown swallows bare Exception at INFO level (run_story.py around lines 286-290) — a broken websocket during teardown is logged "failed, ignored" at INFO and silently poisons the next story's setup via the shared client. logger.warning (or logger.exception) would be more honest, and the catch could be narrowed to ToolError / expected transport errors.
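To make items 1 and 2 concrete, a sketch of the tightened filter (field handling simplified; adjust to the real attach point in __main__.py):

import logging

from fastmcp.exceptions import ToolError
from pydantic import ValidationError


class ToolValidationLogFilter(logging.Filter):
    """Demote tool-raised / validation failures to a one-line WARNING."""

    def filter(self, record: logging.LogRecord) -> bool:
        if record.exc_info is None:
            return True                                  # nothing to demote
        err = record.exc_info[1]
        if not isinstance(err, (ToolError, ValidationError)):
            return True                                  # auth/resource/prompt errors keep their stacks
        record.msg = f"{record.getMessage()} ({err})"    # keep a one-line summary of the failure
        record.args = ()
        record.levelno = logging.WARNING
        record.levelname = logging.getLevelName(logging.WARNING)
        record.exc_info = None                           # item 2: this is the lossy part
        record.exc_text = None
        return True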

Nice-to-have

  1. Test coverage for ToolValidationLogFilter — the sibling StatelessSessionLogFilter has tests/src/unit/test_stateless_session_log_filter.py with five targeted cases. Cloning that template for the new filter would lock in the behavior (bare Exception passes through, subclass handling, exc_info=None pass-through, right-logger/wrong-message, etc.) and guard against a future tightening regression; a starter case is sketched after this list.

  2. Lazy imports inside filter() (__main__.py:379-380) — both fastmcp.exceptions and pydantic are loaded long before any log record reaches this filter, so hoisting the imports to module scope removes a per-record sys.modules lookup in the hot path. Low-impact but free.

  3. Docstring clarity in _inprocess.py — "the env-swap and WebSocket disconnect point ha_mcp's module-level settings at the target HA instance" is accurate but a bit opaque. Something like "clearing ha_mcp.config._settings forces the next get_global_settings() call to re-read env; the websocket disconnect tears down any cached client on the previous URL" would be easier on future readers.
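A starter case for that test module (import path and logger name assumed; adjust to wherever the filter actually lives):

import logging
import sys

from fastmcp.exceptions import ToolError

from ha_mcp.__main__ import ToolValidationLogFilter   # adjust to the real location


def _record_with(exc: BaseException) -> logging.LogRecord:
    try:
        raise exc
    except BaseException:
        exc_info = sys.exc_info()
    return logging.LogRecord(
        name="fastmcp",            # illustrative; the real filter may key off a specific logger
        level=logging.ERROR, pathname=__file__, lineno=0,
        msg="tool failed", args=(), exc_info=exc_info,
    )


def test_tool_error_demoted_to_one_line_warning():
    record = _record_with(ToolError("bad argument"))
    assert ToolValidationLogFilter().filter(record) is True   # still emitted, just demoted
    assert record.levelno == logging.WARNING
    assert record.exc_info is None                            # stack dropped


def test_bare_exception_keeps_its_traceback():
    record = _record_with(RuntimeError("real bug"))
    assert ToolValidationLogFilter().filter(record) is True
    assert record.levelno == logging.ERROR
    assert record.exc_info is not None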

Strengths

  • Filter correctness verified: bare Exception and NotFoundError retain full tracebacks; pydantic.ValidationError is identity-equal to pydantic_core.ValidationError, so both fastmcp log paths are caught.
  • append_result(passed=False) default is a good failure-closed choice — the prior exit_code=0 default is exactly what caused the Summary/JSONL divergence this PR fixes.
  • verify_ha_checks(mcp_client) breaking change is safe (single caller, grep-verified).
  • SuggestingArgumentParser, stdin TTY guard, /v1 suffix note in the README, and the per-agent prompt_start vs run_start split (commit 945a4c8) — all genuine UX improvements.

Happy to iterate on any of these.

…ed client

- Narrow ToolValidationLogFilter to ToolError (was FastMCPError)
- Add websocket_manager.disconnect() in inprocess_mcp_client finally
- Log teardown failures as WARNING (was INFO) with shared-client note
- Hoist fastmcp/pydantic imports to module scope
- Document shared-client SPoF tradeoff and process-global env mutation
- Add unit tests for ToolValidationLogFilter (6 cases)
@sergeykad sergeykad changed the title from feat(uat): runner ergonomics, shared MCP client, logging cleanup to feat: UAT runner ergonomics + demote fastmcp tool-failure tracebacks on Apr 24, 2026
@sergeykad Collaborator Author

Thanks for the review.

Addressed in 8869f42:

  1. Narrowed FastMCPError to ToolError so future auth/resource/prompt errors keep their stacks.
  2. Added a docstring note on the shared-client tradeoff in _inprocess.py. Reconnect guard would mask real WebSocket breakage; keeping fail-loud.
  3. Added symmetric websocket_manager.disconnect() in the finally block.
  4. Added a "not safe for concurrent use" note.
  5. Raised teardown failure log from INFO to WARNING with context that the shared client may be poisoned for the next story. Kept the broad except Exception so unexpected errors still surface rather than being hidden behind a narrow ToolError-only catch.
  6. Added tests/src/unit/test_tool_validation_log_filter.py mirroring the StatelessSessionLogFilter test pattern (6 cases: pydantic demotion, ToolError demotion, bare Exception passthrough, non-ToolError FastMCPError subclass passthrough, wrong logger, no exc_info).
  7. Hoisted fastmcp.exceptions and pydantic imports to module scope.
  8. Rewrote the _inprocess.py docstring to explain the _settings = None and websocket_manager.disconnect() semantics explicitly.

Not addressing:

  1. exc_info nulling is intentional. The structured error info (pydantic .errors() or ToolError message) lands in record.msg, and the filter's whole purpose is to replace the fastmcp/pydantic-internal stack with a single WARNING line. Validation and tool errors are user-input problems, not server bugs that warrant a stack for Sentry.
  2. Not splitting the log filter into its own commit; instead retitled the PR so release-note automation sees the correct scope.

@sergeykad sergeykad enabled auto-merge (squash) April 24, 2026 19:59
@kingpanther13 (Member) left a comment

Thanks for the quick turnaround. All the code items land cleanly, the new test_tool_validation_log_filter.py is exactly the right shape (the FutureAuthError case nicely locks in that non-ToolError subclasses keep their stacks), and the rewritten _inprocess.py docstring is a lot clearer.

On the exc_info nulling — fair call. With the filter narrowed to ToolError / pydantic ValidationError and the structured detail folded into record.msg, these really are user-input errors rather than bugs needing a Sentry-grade stack. The docstring now makes that intent explicit, which is what I was really after.

LGTM.

@sergeykad sergeykad merged commit 7c4836b into master Apr 24, 2026
19 checks passed
@sergeykad sergeykad deleted the fix/uat-runner-ergonomics branch April 24, 2026 20:04
@github-actions Contributor

🧪 Your changes are now in the dev channel!

Your PR has been merged to master and is available for testing in the dev channel.

Test your changes before the next stable release (biweekly Wednesday):
📖 Dev Channel Documentation

Quick start

# Run dev version
uvx ha-mcp-dev

# Check version
uvx ha-mcp-dev --version

Docker:

docker pull ghcr.io/homeassistant-ai/ha-mcp:dev
docker run --rm -i \
  -e HOMEASSISTANT_URL=http://your-ha:8123 \
  -e HOMEASSISTANT_TOKEN=your_token \
  ghcr.io/homeassistant-ai/ha-mcp:dev

Found an issue? Please open a new bug report and mention this PR for context.
